Efficient Multidimensional Searching Routines

Author

  • Joshua D. Reiss
Abstract

The problem of Music Information Retrieval can often be formalized as “searching for multidimensional trajectories”. It is well known that string-matching techniques provide robust and effective theoretical solutions to this problem. However, for low-dimensional searches, especially queries concerning a single vector as opposed to a series of vectors, a wide variety of other methods are available. In this work we examine and benchmark those methods and attempt to determine whether they may be useful in the field of information retrieval. Notably, we propose the use of KD-Trees for multidimensional near-neighbor searching. We show that a KD-Tree is optimized for multidimensional data, and is preferred over other methods that have been suggested, such as the K-Tree, the box-assisted sort and the multidimensional quick-sort.

1. MULTIDIMENSIONAL SEARCHING IN MUSIC IR

The generic task in Music IR is to search for a query pattern, either a few seconds of raw acoustic data or some type of symbolic file (such as MIDI), in a database of the same format. To perform this task, we have to encode the files in a convenient way. If the files are raw acoustic data, we often resort to feature extraction (fig. 1). The files are cut into M time frames, and for each frame we apply a signal-processing transform that outputs a vector of n features (e.g. psychoacoustic parameters such as pitch, loudness, brightness, etc.). If the data is symbolic, we similarly encode each symbol (e.g. each note, supposing there are M of them) with an n-dimensional vector (e.g. pitch, duration). In both cases, each file in the database is turned into a trajectory of M vectors of dimension n.

Figure 1: Feature extraction

Within this framework, two search strategies can be considered.

String-matching techniques try to align two vector sequences of length $m$, $(x(1), x(2), \ldots, x(m))$ and $(y(1), y(2), \ldots, y(m))$, using a set of elementary operations (substitutions, insertions...).
They have received much coverage in the Music IR community (see for example [1]) since they allow a context-dependent measure of similarity and thus can account for many of the high-level specificities of a musical query (e.g., replacing a note by its octave shouldn’t be a mismatch). They are robust and relatively fast.

Another approach would be to “fold” the trajectories of m vectors of dimension n into embedded vectors of higher dimension $N = m \cdot n$. For example, with m=3 and n=2:

$(x(1), x(2), \ldots, x(m)) = (x_1(1), x_2(1), x_1(2), x_2(2), x_1(3), x_2(3))$

The search problem now consists of identifying the nearest vector in a multidimensional data set (i.e., the database) to some specified vector (i.e., the query). This approach may seem awkward, because:

  • We lose structure in the data that could be used to help the search routines (e.g., knowledge that $x_1(1)$ and $x_1(2)$ are coordinates of the same “kind”).
  • We increase the dimensionality of the search.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.

However, there has been a considerable amount of work in devising very efficient searching and sorting routines for such multidimensional data. A complete review of the multidimensional data structures that might be required is given by Samet et al. [2,3]. Non-hierarchical methods, such as the use of grid files [4] and extendable hashing [5], have been applied to multidimensional searching and analyzed extensively. In many areas of research, the KD-Tree has become accepted as one of the most efficient and versatile methods of searching. This and other techniques have been studied in great detail throughout the field of computational geometry [6,7].
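
The folding step and the single-vector query it produces can be illustrated with a short Python sketch. This is not the authors' code: the function names `fold` and `nearest_brute_force`, the use of overlapping windows, the Euclidean metric, and the sample data are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def fold(trajectory, m):
    """Fold a trajectory of n-dimensional vectors into embedded vectors of
    dimension N = m * n, one per window of m consecutive vectors.
    (Overlapping windows are an assumption of this sketch.)"""
    windows = []
    for start in range(len(trajectory) - m + 1):
        embedded = []
        for vector in trajectory[start:start + m]:
            embedded.extend(vector)
        windows.append(tuple(embedded))
    return windows


def nearest_brute_force(database, query):
    """Exhaustive search: return (index, distance) of the closest vector."""
    best = min(range(len(database)), key=lambda i: dist(database[i], query))
    return best, dist(database[best], query)


# Hypothetical trajectory of (pitch, duration) pairs, so n = 2.
melody = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0), (67, 2.0)]
database = fold(melody, m=3)          # three embedded vectors with N = 6
query = (62, 0.5, 64, 0.5, 65, 1.0)   # an already folded three-note query
index, distance = nearest_brute_force(database, query)
```

The exhaustive scan compares the query against every stored vector; it serves only as a baseline against which a structured search can be checked.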
Therefore, we feel that Music IR should capitalize on these well-established techniques. It is our hope that we can shed some light on the beneficial uses of KD-Trees in this field, and on how the multidimensional framework can be adapted to the peculiarities of music data.

The paper is organized as follows. In the next four sections, we review four multidimensional searching routines: the KD-Tree, the K-Tree, the Multidimensional Quick-sort, which is an original algorithm proposed by the authors, and the Box-Assisted Method. Discussion of each of these methods assumes that the data consists of M N-dimensional vectors, regardless of what each dimension represents or how the vectors were created or extracted. We then benchmark and compare these routines, with an emphasis on the very efficient KD-Tree algorithm. Finally, we examine some properties of these algorithms with regard to a multidimensional approach to Music IR.
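
As a rough illustration of the kind of structure being reviewed, the sketch below builds a textbook KD-tree over N-dimensional points and answers a nearest-neighbor query. It is a generic median-split variant and not necessarily the one benchmarked in the paper; the names `build_kd_tree` and `kd_nearest`, the tuple-based node layout, and the sample points are assumptions.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def build_kd_tree(points, depth=0):
    """Build a KD-tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])           # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # the median splits the set in two
    return (points[mid], axis,
            build_kd_tree(points[:mid], depth + 1),
            build_kd_tree(points[mid + 1:], depth + 1))


def kd_nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or dist(point, query) < dist(best, query):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = kd_nearest(near, query, best)
    # The far side can hold a closer point only if the splitting plane
    # lies nearer to the query than the best match found so far.
    if abs(diff) < dist(best, query):
        best = kd_nearest(far, query, best)
    return best


# Hypothetical two-dimensional data set.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(points)
closest = kd_nearest(tree, (9, 2))
```

Pruning the far subtree is what lets the search skip most of the data when the dimension is low; as the dimension grows, fewer subtrees can be pruned and the cost moves toward that of an exhaustive scan.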

Similar resources

Efficient Multidimensional Searching Routines for Music Information Retrieval

The problem of Music Information Retrieval can often be formalized as “searching for multidimensional trajectories”. It is well known that string-matching techniques provide robust and effective theoretic solutions to this problem. However, for low dimensional searches, especially queries concerning a single vector as opposed to a series of vectors, there are a wide variety of other methods ava...

Graph Search of Software Models Using Multidimensional Scaling

Software models formalize the requirements, structure and behavior of a system or application. They represent essential artifacts that simplify the process of software development. Software repositories have been developed to store models in order to facilitate the reuse of know-how from software projects; however, methods for searching these model repositories are not very efficient. Specifica...

An efficient shock-capturing central-type scheme for multidimensional relativistic flows I. Hydrodynamics

Multidimensional shock-capturing numerical schemes for special relativistic hydrodynamics (RHD) are computationally more expensive than their correspondent Euler versions, due to the nonlinear relations between conservative and primitive variables and to the consequent complexity of the Jacobian matrices (needed for the spectral decomposition in most of the approximate Riemann solvers of common...

A comparative study of multiple attribute tree and inverted file structures for large bibliographic files

A variety of data structures such as inverted file, multi-lists, quad tree, k-d tree, range tree, polygon tree, quintary tree, multidimensional tries, segment tree, doubly chained tree, the grid file, d-fold tree, super B-tree, Multiple Attribute Tree (MAT), etc. have been studied for multidimensional searching and related problems. Physical data base organization, which is an important applic...

Application of Threshold-accepting to the Evaluation of the Discrepancy of a Set of Points

Efficient routines for multidimensional numerical integration are provided by quasi-Monte Carlo methods. These methods are based on evaluating the integrand at a set of representative points of the integration area. A set may be called representative if it shows a low discrepancy. However, in dimensions higher than two and for a large number of points the evaluation of discrepancy becomes infe...

Publication date: 2001